On the Interaction of Tiling and Automatic Parallelization
Abstract
Iteration space tiling is a well-explored programming and compiler technique for enhancing program locality. Its performance benefit appears obvious, as the ratio of processor speed to memory speed continues to increase. While incorporating a tiling pass into an advanced parallelizing compiler, we found that the interaction of tiling and parallelization raises unexplored issues. Applying existing sequential tiling techniques followed by parallelization degrades performance in many programs. Applying tiling after parallelization without considering parallel execution semantics may produce incorrect programs, and doing so conservatively also introduces overhead in some of the measured programs. In this paper, we present an algorithm that applies tiling in concert with parallelization and avoids these negative effects. The paper also presents the first comprehensive evaluation of tiling techniques on compiler-parallelized programs. Our tiling algorithm improves the SPEC CPU95 floating-point programs by up to 21% over non-tiled versions (4.9% on average) and the SPEC CPU2000 Fortran 77 programs by up to 49% (11% on average). Notably, in about half of the benchmarks, tiling does not have a significant effect.
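To make the abstract's terms concrete, here is a minimal sketch of a tiled matrix multiplication whose outermost tile loop is run concurrently. It is not the paper's algorithm; it assumes a dense matrix product on Python lists, a fixed tile size, and hypothetical helper names (`matmul_naive`, `_row_tile`, `matmul_tiled_parallel`). What it illustrates is that after tiling, the loop over row tiles (`ii`) carries no dependence, because each tile writes a disjoint set of rows of `c`, so its iterations can be handed to parallel workers. The `kk` loop, by contrast, accumulates into the same elements and stays sequential within a tile.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_naive(a, b):
    """Reference i-k-j matrix product c = a @ b on lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def _row_tile(a, b, c, ii, tile):
    """Compute rows ii .. ii+tile-1 of c, tiling the k and j loops for locality."""
    n, m, p = len(a), len(b), len(b[0])
    for kk in range(0, m, tile):  # k tiles run in order: they accumulate into c
        for jj in range(0, p, tile):
            for i in range(ii, min(ii + tile, n)):
                ai, ci = a[i], c[i]
                for k in range(kk, min(kk + tile, m)):
                    aik, bk = ai[k], b[k]
                    for j in range(jj, min(jj + tile, p)):
                        ci[j] += aik * bk[j]


def matmul_tiled_parallel(a, b, tile=32, workers=4):
    """Tiled product; the outer row-tile loop is executed by a thread pool."""
    n, p = len(a), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Each row tile writes a disjoint set of rows of c, so the ii loop carries
    # no dependence and its iterations may run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        jobs = [pool.submit(_row_tile, a, b, c, ii, tile) for ii in range(0, n, tile)]
        for job in jobs:
            job.result()  # re-raise any exception from a worker
    return c
```

Because every element `c[i][j]` still receives its `k` contributions in ascending order, the tiled version returns exactly the same values as `matmul_naive`, which makes a convenient correctness check. In CPython the thread pool only demonstrates which loop is safe to run concurrently; the global interpreter lock prevents a real speedup, whereas a parallelizing compiler would emit parallel loop code for the `ii` loop instead.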
Similar Resources
IST ACOTES Project Deliverable D3.4 Task-Level Optimization Prototype
This deliverable describes the task-level optimization libraries designed for the ACOTES tool chain. Based on the polyhedral model of compilation, these libraries support automatic parallelization and adaptation of thread-level parallelism to a target. The libraries can also be applied to loop nest optimization of single-threaded code. Early experimental validation is described in the document,...
Automatic Transformations for Effective Parallel Execution on Intel Many Integrated Core
We demonstrate in this work the potential effectiveness of a source-to-source framework for automatically optimizing a sub-class of affine programs on the Intel Many Integrated Core Architecture. Data locality is achieved through complex and automated loop transformations within the polyhedral framework to enable parallel tiling, and the resulting tiles are processed by an aggressive automatic ...
A Linear Algebraic View of Loop Transformations and Their Interaction
Although optimizing transformations have been studied for over two decades, the interactions between them are not well understood. This is particularly important for the success of parallelizing compilers. In order to deal with interactions, we view loop transformations as multiplication by a suitable matrix. The transformations considered are loop interchange, permutation, reversal, hyperplane ...
Delivering High Performance to Parallel Applications Using Advanced Scheduling
This paper presents a complete framework for the parallelization of nested loops by applying a tiling transformation and automatically generating MPI code that allows for an advanced scheduling scheme. In particular, under the advanced scheduling scheme we consider two separate techniques: first, the application of a suitable tiling transformation, and second, the overlapping of computation and communica...
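The matrix view of loop transformations described in "A Linear Algebraic View of Loop Transformations and Their Interaction" can be illustrated with a small sketch. It is not taken from that paper; it assumes the textbook formulation in which a transformation matrix T is legal for a loop nest when every transformed dependence distance vector T·d remains lexicographically positive, and it uses the common sufficient condition that rectangular tiling is legal when all components of every transformed distance vector are nonnegative. The helper names (`mat_vec`, `mat_mul`, `is_legal`, `rectangular_tiling_ok`) are hypothetical.

```python
from typing import Sequence, Tuple

Matrix = Tuple[Tuple[int, ...], ...]
Vector = Tuple[int, ...]


def mat_vec(t: Matrix, d: Vector) -> Vector:
    """Apply transformation t to a dependence distance vector d."""
    return tuple(sum(row[c] * d[c] for c in range(len(d))) for row in t)


def mat_mul(t2: Matrix, t1: Matrix) -> Matrix:
    """Compose transformations: apply t1 first, then t2 (i.e. t2 @ t1)."""
    cols = list(zip(*t1))
    return tuple(tuple(sum(x * y for x, y in zip(row, col)) for col in cols) for row in t2)


def lex_positive(v: Vector) -> bool:
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal(t: Matrix, deps: Sequence[Vector]) -> bool:
    """Legal if every carried dependence stays lexicographically positive."""
    return all(lex_positive(mat_vec(t, d)) for d in deps)


def rectangular_tiling_ok(t: Matrix, deps: Sequence[Vector]) -> bool:
    """Sufficient condition for tiling: all transformed distances are >= 0."""
    return all(all(x >= 0 for x in mat_vec(t, d)) for d in deps)


IDENTITY: Matrix = ((1, 0), (0, 1))
INTERCHANGE: Matrix = ((0, 1), (1, 0))
SKEW: Matrix = ((1, 0), (1, 1))  # j' = i + j
```

For the loop `for i: for j: a[i][j] = a[i-1][j+1] + 1`, the only dependence distance is (1, -1). Interchange alone maps it to (-1, 1) and is illegal; skewing maps it to (1, 0), which makes the nest rectangularly tileable; and skewing followed by interchange, `mat_mul(INTERCHANGE, SKEW)`, maps it to (0, 1) and is legal.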
Publication date: 2005